New discrete heavy tailed distributions as models for insurance data

Although many data sets are discrete and heavy tailed (for example, number of claims and claim amounts if recorded as rounded values), not many discrete heavy tailed distributions are available in the literature. In this paper, we discuss thirteen known discrete heavy tailed distributions, propose nine new discrete heavy tailed distributions and give expressions for their probability mass functions, cumulative distribution functions, hazard rate functions, reversed hazard rate functions, means, variances, moment generating functions, entropies and quantile functions. Tail behaviour and a measure of asymmetry are used to compare the known and new discrete heavy tailed distributions. The better fits of the discrete heavy tailed distributions over their continuous counterparts, as assessed by probability plots, are illustrated using three data sets. Finally, a simulation study is performed to assess the finite sample performance of the maximum likelihood estimators used in the data application section.


Introduction
If data are continuous and heavy tailed then the data should be modeled by continuous heavy tailed distributions. By continuous heavy tailed distributions, we mean distributions that have mostly polynomial tails. If data are discrete and heavy tailed then the data should be modeled by discrete heavy tailed distributions. By discrete heavy tailed distributions, we mean discrete versions of continuous heavy tailed distributions.
In reality, many data sets (for example, number of claims and claim amounts often recorded as rounded values) are discrete in nature and have heavy tails. Three examples are given later in Section 5 [1,2]. However, mostly continuous heavy tailed distributions are used to model these data. There are not many discrete heavy tailed distributions in the literature, see [3-7]. Discrete heavy tailed distributions can often provide better fits to data than the continuous counterparts.
The aim of this paper is to review known discrete heavy tailed distributions, propose several new discrete heavy tailed distributions, list their properties and illustrate data applications showing the superiority of discrete heavy tailed distributions. Let $Y$ be a non-negative continuous random variable having a heavy tail with cumulative distribution function (CDF) specified by $F_Y(\cdot)$. We define a random variable $X$, say, having a discrete heavy tailed distribution as one specified by the probability mass function (PMF) $$p_X(x) = F_Y(x + 1) - F_Y(x)$$ for $x = 0, 1, \ldots$. The corresponding CDF, hazard rate function (HRF), reversed hazard rate function (RHRF), $m$th moment, moment generating function (MGF), entropy and quantile function (QF) follow from $p_X$; in particular, the $m$th moment, entropy and QF are $$\sum_{j=0}^{\infty} j^m p_X(j), \quad -\sum_{x=0}^{\infty} \left[ \log p_X(x) \right] p_X(x) \quad \text{and} \quad Q_X(p) = \left\lceil F_X^{-1}(p) \right\rceil,$$ respectively. Expressions for the mean, variance, MGF and entropy are given mostly as infinite sums. We suppose that these infinite sums converge for at least some of the parameter values. The technique used for constructing the new discrete distributions has been used by [8] to construct models for grouped actuarial data. But the emphasis in [8] is on the comparison of different estimation methods (a modified maximum likelihood method, a modified generalized method of moments approach, and the traditional maximum likelihood method).
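The construction can be illustrated numerically. The following is a minimal Python sketch, assuming the discretization $p_X(x) = F_Y(x+1) - F_Y(x)$ (the form consistent with the discrete Burr PMF in Section 2); the helper names `discretize` and `lomax_cdf`, and the choice of a Lomax CDF with shape 1.5, are hypothetical and serve only as an example.

```python
from typing import Callable


def discretize(cdf: Callable[[float], float]) -> Callable[[int], float]:
    """Turn a continuous CDF F_Y into the PMF p_X(x) = F_Y(x + 1) - F_Y(x)."""

    def pmf(x: int) -> float:
        if x < 0:
            return 0.0  # the support is x = 0, 1, 2, ...
        return cdf(x + 1) - cdf(x)

    return pmf


def lomax_cdf(y: float, alpha: float = 1.5) -> float:
    """Illustrative heavy tailed CDF (Lomax / Pareto type II); an assumption."""
    return 1.0 - (1.0 + y) ** (-alpha) if y > 0 else 0.0


pmf = discretize(lomax_cdf)

# The partial sums telescope: p(0) + ... + p(N - 1) = F_Y(N) - F_Y(0).
partial_sum = sum(pmf(x) for x in range(1000))
print(partial_sum, lomax_cdf(1000))
```

Because the partial sums telescope, the total mass approaches one exactly as fast as the continuous tail $1 - F_Y(N)$ vanishes, which is why the discrete version inherits the tail of its continuous counterpart.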
Several of the new discrete heavy tailed distributions are highly flexible in modeling insurance data sets. One of the new discrete heavy tailed distributions has the most flexible tail behaviour, see Table 1 later. Its tail behaviour is a product of polynomial and exponential terms. Furthermore, four of the new discrete heavy tailed distributions provided adequate fits to one insurance data set with an extremely heavy tail (kurtosis = 1430.076), see Section 5.3 later. None of the known discrete heavy tailed distributions were able to provide an adequate fit to the data in Section 5.3.
The contents of the paper are organized as follows. Known discrete heavy tailed distributions are listed in Section 2 and new discrete heavy tailed distributions are listed in Section 3. Comparison of the known and new discrete heavy tailed distributions in terms of tail behaviour and a measure of asymmetry is given in Section 4. The better fits of the discrete versions over continuous counterparts are illustrated using three insurance data sets, see Section 5. The data sets are listed in Appendices A, B and C. The method of maximum likelihood is used for estimating parameters of the discrete and continuous heavy tailed distributions. Finite sample performance of the method is assessed in Section 6 via a simulation study. Some conclusions and future work are noted in Section 7.

Table 1. Tail behaviours and range of the asymmetry measure for the discrete distributions in Sections 2 and 3. (Columns: Distribution; Tail behaviour; Asymmetry measure.)

Known discrete heavy tailed distributions
In this section, we list thirteen known discrete heavy tailed distributions. Their applications can be read from the cited papers and references therein.

Discrete Burr distribution
A random variable $X$ has this distribution [11] if its PMF, CDF, HRF and RHRF are given by $$p_X(x) = \theta^{\log(1 + x^a)} - \theta^{\log[1 + (1 + x)^a]}, \quad \ldots,$$ respectively, for $x = 0, 1, \ldots$, $a > 0$ and $0 < \theta < 1$. The mean, variance, MGF, entropy and QF of $X$ are $$\ldots, \quad -\sum_{x=0}^{\infty} \log\left\{ \theta^{\log(1 + x^a)} - \theta^{\log[1 + (1 + x)^a]} \right\} \left\{ \theta^{\log(1 + x^a)} - \theta^{\log[1 + (1 + x)^a]} \right\} \quad \text{and} \quad Q_X(p) = \ldots,$$ respectively, for $0 < p < 1$. The discrete Pareto distribution is the particular case of the discrete Burr distribution for $a = 1$.
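The discrete Burr PMF stated above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the survival function $P(X \geq x) = \theta^{\log(1 + x^a)}$ is an assumption obtained by telescoping the PMF rather than a formula quoted from the text.

```python
import math


def discrete_burr_pmf(x: int, a: float, theta: float) -> float:
    """PMF theta^{log(1 + x^a)} - theta^{log(1 + (1 + x)^a)} for x = 0, 1, ..."""
    if x < 0:
        return 0.0
    return theta ** math.log(1.0 + x ** a) - theta ** math.log(1.0 + (1.0 + x) ** a)


def discrete_burr_survival(x: int, a: float, theta: float) -> float:
    """P(X >= x) = theta^{log(1 + x^a)}, obtained by telescoping the PMF."""
    return theta ** math.log(1.0 + x ** a)


# Setting a = 1 gives the discrete Pareto special case mentioned in the text.
pareto_mass = [discrete_burr_pmf(x, 1.0, 0.5) for x in range(5)]
print(pareto_mass)
```

For $a = 1$ the PMF can equivalently be written as $(1 + x)^{\log\theta} - (2 + x)^{\log\theta}$, since $\theta^{\log u} = u^{\log\theta}$.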

Discrete extended Weibull distribution
A random variable X has this distribution [13] if its PMF, CDF, HRF and RHRF are given by and respectively, for x = 0, 1, . . ., a > 0, b > 0 and 0 < q < 1. The mean, variance, MGF, entropy and QF of X are

Discrete inverse Weibull distribution
A random variable $X$ has this distribution [14] if its PMF, CDF, HRF and RHRF are given by $$\ldots \quad \text{and} \quad r_X(x) = q^{(x + 1)^{-a} - x^{-a}} - 1,$$ respectively, for $x = 0, 1, \ldots$, $a > 0$ and $0 < q < 1$. The mean, variance, MGF, entropy and QF of $X$ are $\ldots$, respectively, for $0 < p < 1$. The discrete inverse Rayleigh distribution due to [15] is the particular case of the discrete inverse Weibull distribution for $a = 2$.

Discrete log-logistic distribution
A random variable X has this distribution due to [16] if its PMF, CDF, HRF and RHRF are given by respectively, for x = 0, 1, . . ., a > 0 and b > 0. The mean, variance, MGF, entropy and QF of X are respectively, for 0 < p < 1.

Discrete lognormal distribution
A random variable $X$ has this distribution due to [17] if its PMF, CDF, HRF and RHRF are given by $\ldots$, respectively, for $x = 0, 1, \ldots$, $\mu > 0$ and $\sigma > 0$, where $\Phi(\cdot)$ denotes the standard normal CDF. The mean, variance, MGF, entropy and QF of $X$ are $\ldots$, respectively, for $0 < p < 1$.

Discrete modified Weibull distribution
A random variable X has this distribution [18] if its PMF, CDF, HRF and RHRF are given by respectively, for x = 0, 1, . . ., a > 0, b > 0 and 0 < q < 1. The mean, variance, MGF, entropy and QF of X are

Discrete reduced modified Weibull distribution
A random variable $X$ has this distribution [19] if its PMF, CDF, HRF and RHRF are given by $\ldots$, respectively, for $x = 0, 1, \ldots$, $a > 0$, $b \geq 1$ and $0 < q < 1$. The mean, variance, MGF, entropy and QF of $X$ are $\ldots$

Discrete Student's t distribution
A random variable X has this distribution due to [20] if its PMF is given by denotes the imaginary part of ψ(z) = d log Γ(z)/dz and Γ(a) denotes the gamma function defined by The mean, variance and entropy of X are and respectively.

Exponentiated discrete Weibull distribution
A random variable X has this distribution [22] if its PMF, CDF, HRF and RHRF are given by respectively, for x = 0, 1, . . ., a > 0, b > 0 and 0 < q < 1. The mean, variance, MGF, entropy and QF of X are respectively, for 0 < p < 1. The discrete Weibull distribution due to [23] is the particular case of the exponentiated discrete Weibull distribution for b = 1. The discrete Rayleigh distribution due to [24] is the particular case of the exponentiated discrete Weibull distribution for a = 2 and b = 1. The exponentiated discrete Rayleigh distribution due to [25] is the particular case of the exponentiated discrete Weibull distribution for a = 2.

New discrete heavy tailed distributions
In this section, we list nine new discrete heavy tailed distributions. These are based on continuous heavy tailed distributions that have been used as models for insurance data. See [26] for the log Cauchy distribution, [27] for the Lévy distribution and [28] for the log gamma distribution. See [29] for the Fréchet, inverse exponential, paralogistic, inverse paralogistic, transformed gamma, inverse transformed gamma, inverse gamma and generalized Pareto distributions.

Discrete Fréchet distribution
The continuous Fréchet distribution is due to [30]. A random variable $X$ has the discrete Fréchet distribution if its PMF, CDF, HRF and RHRF are given by $\ldots$, respectively, for $x = 0, 1, \ldots$, $a > 0$ and $b > 0$. The mean, variance, MGF, entropy and QF of $X$ are $$\ldots \quad \text{and} \quad Q_X(p) = \left\lceil b \left( -\log p \right)^{-1/a} \right\rceil,$$ respectively, for $0 < p < 1$. The discrete inverse exponential distribution is the particular case of the discrete Fréchet distribution for $a = 1$.
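The QF of the discrete Fréchet distribution lends itself to inverse-transform sampling. The sketch below is a hedged illustration, not the authors' code: the function names are hypothetical, and it simply applies $Q_X(p) = \lceil b(-\log p)^{-1/a} \rceil$ to uniform random numbers, so whether the draws match the support convention $x = 0, 1, \ldots$ of the PMF depends on the rounding convention, which is assumed here.

```python
import math
import random


def discrete_frechet_quantile(p: float, a: float, b: float) -> int:
    """Q_X(p) = ceil(b * (-log p)^(-1/a)) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(b * (-math.log(p)) ** (-1.0 / a))


def sample_discrete_frechet(n: int, a: float, b: float, seed: int = 0) -> list:
    """Draw n values by plugging uniform random numbers into the QF."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # rng.random() lies in [0, 1); skip the endpoint 0
            draws.append(discrete_frechet_quantile(u, a, b))
    return draws


sample = sample_discrete_frechet(1000, a=1.5, b=2.0)
print(max(sample), sorted(sample)[len(sample) // 2])
```

With a small shape parameter $a$ the sample maximum is typically orders of magnitude larger than the sample median, which is the heavy tail showing up in simulated data.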

Discrete generalized Pareto distribution
The continuous generalized Pareto distribution is due to [31]. A random variable X has the discrete generalized Pareto distribution if its PMF, CDF, HRF and RHRF are given by respectively, for 0 < p < 1.

Discrete inverse paralogistic distribution
The continuous inverse paralogistic distribution results from an inverse transformation of a continuous paralogistic random variable. A random variable $X$ has the discrete inverse paralogistic distribution if its PMF, CDF, HRF and RHRF are given by $\ldots$, respectively, for $x = 0, 1, \ldots$, $a > 0$ and $b > 0$. The mean, variance, MGF, entropy and QF of $X$ are $$\ldots \quad \text{and} \quad Q_X(p) = \left\lceil b \left( p^{-1/a} - 1 \right)^{-1/a} \right\rceil,$$ respectively, for $0 < p < 1$.
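The QF of the discrete inverse paralogistic distribution can be checked by inverting the continuous CDF. The derivation below is a sketch under two assumptions that are not stated explicitly in the text: the common parameterization $F_Y(y) = \left[ (y/b)^a / \{1 + (y/b)^a\} \right]^a$ of the continuous inverse paralogistic distribution, and a QF of the form $Q_X(p) = \lceil F_Y^{-1}(p) \rceil$.

```latex
\begin{align*}
p = \left[ \frac{(y/b)^a}{1 + (y/b)^a} \right]^a
  &\;\Longrightarrow\; p^{1/a} = \frac{(y/b)^a}{1 + (y/b)^a}
   \;\Longrightarrow\; (y/b)^{-a} = p^{-1/a} - 1, \\
y = F_Y^{-1}(p) = b \left( p^{-1/a} - 1 \right)^{-1/a}
  &\;\Longrightarrow\; Q_X(p) = \left\lceil b \left( p^{-1/a} - 1 \right)^{-1/a} \right\rceil .
\end{align*}
```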

Discrete inverse transformed gamma distribution
respectively, for 0 < p < 1. The discrete inverse gamma distribution is the particular case of the discrete inverse transformed gamma distribution for c = 1.

Discrete Lévy distribution
The continuous Lévy distribution is named after Paul Lévy. A random variable X has the discrete Lévy distribution if its PMF, CDF, HRF and RHRF are given by and denotes the incomplete beta function. The mean, variance, MGF, entropy and QF of X are respectively, for 0 < p < 1.

Discrete paralogistic distribution
The continuous paralogistic distribution is due to [32]. A random variable $X$ has the discrete paralogistic distribution if its PMF, CDF, HRF and RHRF are given by $\ldots$, respectively, for $x = 0, 1, \ldots$, $a > 0$ and $b > 0$. The mean, variance, MGF, entropy and QF of $X$ are $\ldots$

Comparison of the known and new discrete heavy tailed distributions
Table 1 gives the tail behaviours of the discrete distributions in Sections 2 and 3. Also given in Table 1 is the range of values that each discrete distribution admits for an asymmetry measure $A$ [33,34]; $A$ can take any value in $[-1, 1]$. We see that the discrete Burr type III, discrete inverse Rayleigh, discrete inverse Weibull, discrete log-logistic, discrete lognormal, discrete Student's t, discrete Fréchet, discrete generalized Pareto, discrete inverse exponential, discrete inverse gamma, discrete inverse paralogistic, discrete inverse transformed gamma, discrete Lévy and discrete paralogistic distributions all have power tail behaviours. Of these, the discrete inverse Rayleigh, discrete inverse exponential and discrete Lévy distributions are not so flexible because their power exponents are fixed constants. The discrete extended Weibull, discrete Rayleigh, discrete Weibull, discrete Weibull geometric, exponentiated discrete Rayleigh and exponentiated discrete Weibull distributions all have exponential type tails. Of these, the discrete Rayleigh and exponentiated discrete Rayleigh distributions are not so flexible because the power exponents inside the exponentials are fixed constants.
The discrete additive Weibull geometric and discrete reduced modified Weibull distributions have tails that are products of two exponential type terms. The discrete Burr and discrete Pareto distributions have exponential type tails with the exponent taking a logarithmic form. The discrete modified Weibull distribution has a double exponential type tail. The discrete log Cauchy distribution has a logarithmic tail. The discrete log gamma distribution has a power tail multiplied by a power of a logarithm. The discrete transformed gamma distribution has a power tail multiplied by an exponential type tail.
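As a worked check on the statement about the discrete Burr and discrete Pareto tails, the display below rewrites the survival function implied by the discrete Burr PMF of Section 2. It is a sketch that assumes the survival function $P(X \geq x) = \theta^{\log(1 + x^a)}$, obtained by telescoping that PMF; it shows that an exponential type tail with a logarithmic exponent is the same thing as a power-law decay.

```latex
\begin{align*}
P(X \geq x) = \sum_{j \geq x} p_X(j)
  = \theta^{\log(1 + x^a)}
  = \exp\left\{ \log\theta \, \log(1 + x^a) \right\}
  = (1 + x^a)^{\log\theta}
  \sim x^{-a \lvert \log\theta \rvert}
  \quad \text{as } x \to \infty,
\end{align*}
```

where the exponent is negative because $0 < \theta < 1$ implies $\log\theta < 0$.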
All but the discrete Pareto, discrete Rayleigh, discrete inverse exponential, discrete inverse paralogistic and discrete Lévy distributions can take the full range of possible values of A. The discrete inverse exponential and discrete inverse paralogistic distributions allow for 0 ≤ A ≤ 1 only. The discrete Pareto distribution allows for 0.317 ≤ A ≤ 1 only. The discrete Rayleigh distribution allows for A = 0.075908 only. The discrete Lévy distribution allows for 1/2 ≤ A ≤ 1 only. Hence, the discrete Rayleigh distribution is the least flexible in terms of asymmetry. The discrete Lévy distribution is the second least flexible. The discrete Pareto distribution is the third least flexible. The discrete inverse exponential and discrete inverse paralogistic distributions are the fourth least flexible. The remaining discrete distributions are fully flexible in terms of asymmetry.
Fig 1 illustrates possible shapes of the PMFs of discrete Fréchet, discrete generalized Pareto, discrete inverse paralogistic, discrete inverse transformed gamma, discrete Lévy, discrete log Cauchy, discrete log gamma, discrete paralogistic, discrete transformed gamma and discrete lognormal distributions. All of the PMFs are either unimodal or monotonically decreasing. All of the PMFs but those for the discrete transformed gamma distribution decay polynomially, see Table 1. The PMFs for the discrete transformed gamma distribution decay faster because of the presence of the exponential term, see Table 1.

Data applications
In Sections 5.1 to 5.3, we illustrate three applications involving insurance data. In each of these sections, we show that at least four of the discrete distributions in Sections 2 and 3 provided better fits than their continuous counterparts. Many of the other discrete distributions in Sections 2 and 3 provided better fits too. The method of maximum likelihood was used to obtain estimates of the parameters of the distributions. We maximized the likelihood function directly by using the function optim in the R software (R Core Team, 2022) [35].
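Direct maximization of the likelihood can be sketched as follows. This is a Python illustration rather than the R code used in the paper: it assumes the discrete Weibull PMF $q^{x^a} - q^{(x+1)^a}$ as the model, replaces optim with a crude grid-refinement search (a swapped-in technique chosen to keep the example dependency-free), and uses a hypothetical toy sample, not one of the three insurance data sets.

```python
import math


def discrete_weibull_logpmf(x, a, q):
    """log of q^{x^a} - q^{(x + 1)^a} for x = 0, 1, ..."""
    p = q ** (x ** a) - q ** ((x + 1) ** a)
    return math.log(p) if p > 0.0 else -math.inf


def neg_log_likelihood(params, data):
    """Negative log-likelihood; returns inf outside a > 0, 0 < q < 1."""
    a, q = params
    if a <= 0.0 or not 0.0 < q < 1.0:
        return math.inf
    return -sum(discrete_weibull_logpmf(x, a, q) for x in data)


def fit_by_grid_refinement(data, a_range=(0.05, 5.0), q_range=(0.01, 0.99),
                           rounds=6, points=21):
    """Crude stand-in for a general-purpose optimizer: search a grid,
    then shrink the grid around the best point found so far."""
    a_lo, a_hi = a_range
    q_lo, q_hi = q_range
    best_nll, best_a, best_q = math.inf, None, None
    for _ in range(rounds):
        a_step = (a_hi - a_lo) / (points - 1)
        q_step = (q_hi - q_lo) / (points - 1)
        for i in range(points):
            for j in range(points):
                a, q = a_lo + i * a_step, q_lo + j * q_step
                nll = neg_log_likelihood((a, q), data)
                if nll < best_nll:
                    best_nll, best_a, best_q = nll, a, q
        if best_a is None:
            raise ValueError("likelihood is zero everywhere on the grid")
        a_lo, a_hi = max(1e-6, best_a - a_step), best_a + a_step
        q_lo, q_hi = max(1e-9, best_q - q_step), min(1.0 - 1e-9, best_q + q_step)
    return best_a, best_q, best_nll


toy_counts = [0, 0, 1, 1, 1, 2, 2, 3, 5, 8]  # hypothetical data for illustration
a_hat, q_hat, nll_hat = fit_by_grid_refinement(toy_counts)
print(a_hat, q_hat, nll_hat)
```

In practice a derivative-free or quasi-Newton optimizer would replace the grid search; the point of the sketch is only that the objective is the summed log PMF over the observed counts.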
The first data set is from [1] and concerns the number of insurance claims. The data values are given in Appendix A in S1 Appendix. Some summary statistics are minimum = 0, first quartile = 7.75, median = 37.5, mean = 98.47, third quartile = 105.75, maximum = 877, standard deviation = 174.3104, skewness = 3.252 and kurtosis = 14.101.
The second data set is from [2] and also concerns the number of insurance claims. The data values are given in Appendix B in S1 Appendix. Some summary statistics are minimum = 0, first quartile = 9.5, median = 22, mean = 49.23, third quartile = 55.5, maximum = 400, standard deviation = 71.1624, skewness = 2.946 and kurtosis = 12.773.
All three data sets are positively skewed and have heavy tails. We tested heavy-tailedness of the three data sets using the test of [36] based on the Kolmogorov-Smirnov statistic. The p-values were 0.21, 0.23 and 0.38, respectively.

Insurance claim data [1]
We fitted six of the discrete distributions in Sections 2 and 3 and their continuous counterparts. The probability plots of the fits are shown in Fig 2. The fits of the continuous versions do not appear reasonable for any of the six distributions. But the fits of the discrete versions appear reasonable for all six distributions. Their plotted points are much closer to the diagonal lines. Table 2, showing the deviations between the observed and expected probabilities, confirms better fits of the discrete versions. Note that the discrete paralogistic distribution gives the smallest sum of squares while the discrete inverse gamma distribution gives the largest sum of squares.

Fig 1. PMFs of discrete Fréchet (first row, left), discrete generalized Pareto (first row, right), discrete inverse paralogistic (second row, left), discrete inverse transformed gamma (second row, right), discrete Lévy (third row, left), discrete log Cauchy (third row, right), discrete log gamma (fourth row, left), discrete paralogistic (fourth row, right), discrete transformed gamma (fifth row, left) and discrete lognormal (fifth row, right) distributions. https://doi.org/10.1371/journal.pone.0285183.g001

Insurance claim data [2]
We fitted six of the discrete distributions in Sections 2 and 3 and their continuous counterparts. The probability plots of the fits are shown in Fig 3. Again the fits of the continuous versions do not appear reasonable for any of the six distributions. The fits of the discrete versions appear reasonable for all six distributions excluding the discrete inverse gamma distribution. Table 3 showing the deviations between the observed and expected probabilities confirms better fits of the discrete versions. Note that the discrete Burr distribution gives the smallest sum of squares while the discrete inverse gamma distribution gives the largest sum of squares.

Travel insurance data
We fitted four of the discrete distributions in Section 3 and their continuous counterparts. The probability plots of the fits are shown in Fig 4. Yet again the fits of the continuous versions do not appear reasonable for any of the four distributions. The fits of the discrete versions appear reasonable for all four distributions. Table 4 showing the deviations between the observed and expected probabilities confirms better fits of the discrete versions. Note that the discrete inverse Burr distribution gives the smallest sum of squares while the discrete inverse transformed gamma distribution gives the largest sum of squares. None of the known discrete heavy tailed distributions gave a reasonable fit for this data set.
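The comparison of observed and expected probabilities behind the probability plots and the sums of squares can be sketched as below. The details are assumptions rather than the paper's exact recipe: the observed probabilities use Blom-type plotting positions $(i - 0.375)/(n + 0.25)$, the expected probabilities are a fitted CDF evaluated at the sorted data, and the discrete Weibull CDF $1 - q^{(x+1)^a}$ stands in for whichever fitted distribution is being assessed. All function names are hypothetical.

```python
def discrete_weibull_cdf(x, a, q):
    """P(X <= x) = 1 - q^{(x + 1)^a} for x = 0, 1, ... (illustrative model)."""
    return 1.0 - q ** ((x + 1) ** a)


def probability_plot_points(data, cdf):
    """Pair assumed plotting positions with fitted CDF values at sorted data."""
    xs = sorted(data)
    n = len(xs)
    observed = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    expected = [cdf(x) for x in xs]
    return observed, expected


def sum_of_squares(observed, expected):
    """Sum of squared deviations between observed and expected probabilities."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


toy_counts = [3, 0, 1, 7]  # hypothetical data for illustration
obs, exp = probability_plot_points(
    toy_counts, lambda x: discrete_weibull_cdf(x, 1.0, 0.5)
)
print(sum_of_squares(obs, exp))
```

A smaller sum of squares means the plotted points lie closer to the diagonal, which is the sense in which one fit is called better than another in this section.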

Simulation study
In this section, we assess the finite sample performance of the maximum likelihood estimators of discrete heavy tailed distributions in terms of biases and mean squared errors. The following simulation study was used.
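1. choose the discrete Weibull distribution, the particular case of the exponentiated discrete Weibull distribution in Section 2.13 for b = 1;
2. set initial values as a = 2 and q = 0.5;
3. simulate a random sample of size n from the discrete Weibull distribution by using the QF in Section 2.13 for b = 1;
4. compute the maximum likelihood estimates of a and q for the sample in step 3;
5. repeat steps 3 and 4 a thousand times, giving the estimates $\hat{a}_i$ and $\hat{q}_i$ for i = 1, 2, . . ., 1000;
6. compute the biases of the estimates, $\frac{1}{1000} \sum_{i=1}^{1000} (\hat{a}_i - a)$ and $\frac{1}{1000} \sum_{i=1}^{1000} (\hat{q}_i - q)$;
7. compute the mean squared errors of the estimates, $\frac{1}{1000} \sum_{i=1}^{1000} (\hat{a}_i - a)^2$ and $\frac{1}{1000} \sum_{i=1}^{1000} (\hat{q}_i - q)^2$;
8. repeat steps 3 to 7 for n = 10, 11, . . ., 100.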
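The steps above can be sketched in Python as follows. Several ingredients are assumptions made for the sake of a self-contained example: the discrete Weibull PMF is taken to be $q^{x^a} - q^{(x+1)^a}$, sampling uses the standard inverse-transform QF for that PMF (which may differ from the QF convention of Section 2.13), the likelihood is maximized by a crude grid refinement inside a fixed search box instead of optim, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def dw_quantile(p, a, q):
    """Smallest x in {0, 1, ...} with 1 - q^{(x + 1)^a} >= p."""
    if p <= 0.0:
        return 0
    ratio = math.log1p(-p) / math.log(q)
    return max(math.ceil(ratio ** (1.0 / a)) - 1, 0)


def dw_neg_loglik(a, q, counts):
    """Negative log-likelihood computed over the distinct observed values."""
    total = 0.0
    for x, m in counts.items():
        p = q ** (x ** a) - q ** ((x + 1) ** a)
        if p <= 0.0:
            return math.inf
        total -= m * math.log(p)
    return total


def dw_fit(sample, rounds=5, points=15):
    """Grid-refinement maximum likelihood; the search box is an assumption."""
    counts = Counter(sample)
    a_lo, a_hi, q_lo, q_hi = 0.1, 6.0, 0.02, 0.98
    best = (math.inf, 0.5 * (a_lo + a_hi), 0.5 * (q_lo + q_hi))
    for _ in range(rounds):
        a_step = (a_hi - a_lo) / (points - 1)
        q_step = (q_hi - q_lo) / (points - 1)
        for i in range(points):
            for j in range(points):
                a, q = a_lo + i * a_step, q_lo + j * q_step
                nll = dw_neg_loglik(a, q, counts)
                if nll < best[0]:
                    best = (nll, a, q)
        a_lo, a_hi = max(0.01, best[1] - a_step), best[1] + a_step
        q_lo, q_hi = max(1e-6, best[2] - q_step), min(1.0 - 1e-6, best[2] + q_step)
    return best[1], best[2]


def simulate(n, reps=1000, a=2.0, q=0.5, seed=1):
    """Steps 3 to 7: simulate, fit, then compute biases and mean squared errors."""
    rng = random.Random(seed)
    a_hats, q_hats = [], []
    for _ in range(reps):
        sample = [dw_quantile(rng.random(), a, q) for _ in range(n)]
        a_hat, q_hat = dw_fit(sample)
        a_hats.append(a_hat)
        q_hats.append(q_hat)
    return {
        "bias_a": sum(a_hats) / reps - a,
        "mse_a": sum((v - a) ** 2 for v in a_hats) / reps,
        "bias_q": sum(q_hats) / reps - q,
        "mse_q": sum((v - q) ** 2 for v in q_hats) / reps,
    }
```

Calling `simulate(n)` for each n from 10 to 100 and plotting the four returned quantities against n corresponds to step 8; a smaller `reps` keeps a trial run quick. Samples in which only one or two distinct values occur push the estimate of a to the edge of the search box, so the box limits influence the small-n results of this sketch.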
Plots of the biases and mean squared errors versus n = 10, 11, . . ., 100 are shown in Fig 5. We can observe the following from the figure: a) the biases generally decrease in magnitude to zero with increasing n; b) the biases for $\hat{a}$ appear generally positive; c) the biases for $\hat{q}$ appear positive or negative; d) the biases for $\hat{a}$ appear larger than those for $\hat{q}$; e) the biases appear small enough for all n ≥ 80; f) the mean squared errors generally decrease to zero with increasing n; g) the mean squared errors for $\hat{a}$ appear larger than those for $\hat{q}$; h) the mean squared errors appear small enough for all n ≥ 80.
The observations noted are for particular parameter values and for a particular discrete heavy tailed distribution. We have not presented results for other choices in order to save space and avoid repetitive discussion. But the same observations held for a wide range of parameter values and for all of the discrete heavy tailed distributions in Sections 2-3. In particular, the biases always decreased to zero with increasing n, the mean squared errors always decreased to zero with increasing n, the biases always appeared small enough for all n ≥ 80, and the mean squared errors always appeared small enough for all n ≥ 80.

Conclusions
We have studied discrete heavy tailed distributions, taken as discrete versions of continuous heavy tailed distributions. We have reviewed thirteen known discrete heavy tailed distributions and introduced nine new discrete heavy tailed distributions. We have given expressions for the PMF, CDF, HRF, RHRF, mean, variance, MGF, entropy and QF for each of the discrete heavy tailed distributions.
We have compared the known and new discrete heavy tailed distributions by tabulating their tail behavior and a measure of asymmetry. The tail behavior was either a polynomial or a polynomial multiplied by a logarithmic term or a polynomial multiplied by an exponential term for the new discrete heavy tailed distributions. All but three of the new distributions allowed the measure of asymmetry to take the full range of possible values.
We have shown that the discrete heavy tailed distributions can provide better fits than continuous versions to three real data sets on insurance payments. The better fits were assessed in terms of probability plots. The observed and expected probabilities were closer to each other when the discrete heavy tailed distributions were fitted. We have assessed the finite sample performance of the maximum likelihood estimators of the discrete heavy tailed distributions by a simulation study. The study showed the biases and the mean squared errors of the estimators appeared small enough for all sample sizes greater than or equal to 80. Two of the three data sets (see Sections 5.1-5.2) have sample sizes less than 80. Hence, the conclusions in Sections 5.1-5.2 should be treated conservatively.
Future work is to study discrete heavy tailed distributions for bivariate, multivariate, matrix variate and complex variate data. These distributions could be based on bivariate t distributions, multivariate t distributions, bivariate skew t distributions, multivariate skew t distributions, bivariate generalized hyperbolic distributions, multivariate generalized hyperbolic distributions, mixtures of these distributions, and others. Further discrete heavy tailed distributions can be constructed using known mechanisms for generating continuous heavy tailed distributions, see, for example, [37]. Moreover, viewing discrete data as a discretization of continuous data, we can build latent variable models and compare them to the new discrete distributions.